Active Learning of Linear Separators

Author

  • Maria-Florina Balcan
Abstract

We focus on binary classification problems; that is, we consider the problem of predicting a binary label y based on its corresponding input vector x. As in the standard machine learning formulation, we assume that the data points (x, y) are drawn from an unknown underlying distribution DXY over X × Y; X is called the instance space and Y is the label space. We assume that Y = {±1} and X = R^d; we also denote the marginal distribution over X by D. Let C be the class of linear separators through the origin, that is, C = {sign(w · x) : w ∈ R^d, ||w|| = 1}. To keep the notation simple, we sometimes refer to a weight vector and the linear classifier with that weight vector interchangeably. Our goal is to output a hypothesis function w ∈ C of small error, where err(w) = err_DXY(w) = P_{(x,y)∼DXY}[sign(w · x) ≠ y].

Recall that in (pool-based) active learning, a set of labeled examples (x1, y1), ..., (xm, ym) is drawn i.i.d. from DXY; the learning algorithm is permitted direct access to the sequence of xi values (unlabeled data points), but has to make a label request to obtain the label yi of example xi. The hope is that, by actively directing the queries to informative examples, we can output a classifier of small error using many fewer label requests than in passive learning (while keeping the number of unlabeled examples polynomial).

For added generality, we also consider the selective sampling active learning model, where the algorithm visits the unlabeled data points xi in sequence and, for each i, decides whether or not to request the label yi based only on the previously observed xj values (j ≤ i) and the corresponding requested labels, never changing this decision once made. Our upper and lower bounds apply to both selective sampling and pool-based active learning.
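As a toy illustration of the selective sampling protocol (not the paper's algorithm), the sketch below streams unlabeled points, queries a label only when the point falls in an uncertain band around the current hypothesis, and applies a perceptron-style update on mistakes. The margin threshold, the perceptron rule, and the uniform-on-the-sphere distribution are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target separator w* on the unit sphere in R^d
# (an illustrative stand-in for the unknown concept).
d = 5
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

def label(x):
    """Noise-free label sign(w* . x), mapping sign(0) to +1."""
    return 1 if w_star @ x >= 0 else -1

# Stream of unlabeled points xi ~ D (here D is uniform on the sphere).
stream = rng.normal(size=(2000, d))
stream /= np.linalg.norm(stream, axis=1, keepdims=True)

w = np.zeros(d)      # current hypothesis
queried = 0
margin = 0.3         # illustrative threshold: query only near the boundary

for x in stream:
    # Selective sampling: the decision to request yi depends only on
    # previously observed points and labels; here we query while we have
    # no hypothesis yet, or when |w . x| <= margin (the uncertain band).
    if queried == 0 or abs(w @ x) <= margin:
        y = label(x)                  # label request
        queried += 1
        if y * (w @ x) <= 0:          # perceptron-style update on mistakes
            w = w + y * x
            w /= np.linalg.norm(w)

# Estimate err(w) on fresh draws from D.
test = rng.normal(size=(5000, d))
test /= np.linalg.norm(test, axis=1, keepdims=True)
err = np.mean([(1 if w @ x >= 0 else -1) != label(x) for x in test])
print(queried, round(err, 3))
```

The point of the sketch is only that the query decision never looks ahead and is never revised, matching the selective sampling model; far fewer than all 2000 labels are requested.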


Similar articles

Active Learning Models and Noise

I study active learning in general pool-based active learning models as well as noisy active learning algorithms, and then compare them for the class of linear separators under the uniform distribution.


Active and passive learning of linear separators under log-concave distributions

We provide new results concerning label efficient, polynomial time, passive and active learning of linear separators. We prove that active learning provides an exponential improvement over PAC (passive) learning of homogeneous linear separators under nearly log-concave distributions. Building on this, we provide a computationally efficient PAC algorithm with optimal (up to a constant factor) sa...


Efficient Learning of Linear Separators under Bounded Noise

We study the learnability of linear separators in R^d in the presence of bounded (a.k.a. Massart) noise. This is a realistic generalization of the random classification noise model, where the adversary can flip each example x with probability η(x) ≤ η. We provide the first polynomial time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the unif...
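A minimal sketch of the bounded (Massart) noise condition described in that snippet: each clean label sign(w* · x) is flipped independently with a probability η(x) that may depend on x but is capped at a bound η. The particular flip-rate function below is an arbitrary assumption chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bounded (Massart) noise: flip each label with probability eta(x) <= eta_max.
d, n, eta_max = 3, 10_000, 0.2
w_star = np.array([1.0, 0.0, 0.0])   # hypothetical target separator

X = rng.normal(size=(n, d))
clean = np.sign(X @ w_star)
clean[clean == 0] = 1                # map sign(0) to +1

# The flip rate may vary with x but is capped at eta_max; here we
# (arbitrarily) make points near the decision boundary noisier.
eta = np.minimum(eta_max, eta_max / (1.0 + np.abs(X @ w_star)))
flip = rng.random(n) < eta
y = np.where(flip, -clean, clean)    # observed (noisy) labels

observed_noise = np.mean(y != clean)
print(round(observed_noise, 3))      # empirical flip rate, below eta_max
```

Random classification noise is the special case where η(x) is the same constant for every x; Massart noise lets the adversary choose any η(x) under the cap.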


Asymptotic Active Learning

We describe and analyze a PAC-asymptotic model for active learning. We show that in many cases where it has traditionally been believed that active learning does not help, active learning does help asymptotically. This view contrasts sharply with the traditional Ω(1/ε) lower bounds for active learning of classes such as non-homogeneous linear separators under the uniform distribution or unions of k i...


The Power of Localization for Efficiently Learning Linear Separators with Malicious Noise

In this paper we put forward new techniques for designing efficient algorithms for learning linear separators in the challenging malicious noise model, where an adversary may corrupt both the labels and the feature part of an η fraction of the examples. Our main result is a polynomial-time algorithm for learning linear separators in Rd under the uniform distribution that can handle a noise rate...



Publication date: 2015